Measuring Stability of Feature Selection in Biomedical Datasets
Authors
Abstract
Feature selection is an important step in the analysis of high-dimensional biomedical data. Typically, a feature subset chosen by a feature selection method is evaluated for its relevance to a task such as prediction or classification. Another important property of a feature selection method is stability, which refers to the robustness of the selected features to perturbations in the data. In biomarker discovery, for example, domain experts prefer a parsimonious subset of features that is relatively robust to slight changes in the data. We present a stability measure, called the adjusted stability measure, that computes the robustness of a feature selection method with respect to random feature selection. This measure is useful for comparing the robustness of feature selection methods and is superior to similar measures that do not account for random feature selection. We demonstrate the application of this measure on a biomedical dataset.
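The abstract does not give the formula for the adjusted stability measure, so the following is only an illustrative sketch of the general idea of correcting subset overlap for chance. It uses a different, well-known chance-corrected index (Kuncheva's consistency index) as a stand-in, not the authors' measure. The function names `kuncheva_index` and `mean_pairwise_stability` are hypothetical, and the sketch assumes every selected subset has the same size k drawn from n_features candidate features.

```python
from itertools import combinations


def kuncheva_index(a, b, n_features):
    """Chance-corrected overlap of two equal-size feature subsets.

    Stand-in illustration (Kuncheva's consistency index), NOT the
    adjusted stability measure from the paper.
    """
    a, b = set(a), set(b)
    k = len(a)
    if k != len(b):
        raise ValueError("subsets must have the same size")
    if not 0 < k < n_features:
        raise ValueError("subset size must satisfy 0 < k < n_features")
    observed = len(a & b)
    # Overlap expected if both subsets were drawn uniformly at random.
    expected = k * k / n_features
    # 1.0 for identical subsets, about 0 for chance-level overlap.
    return (observed - expected) / (k - expected)


def mean_pairwise_stability(subsets, n_features):
    """Average the index over all pairs of subsets, e.g. one subset
    per perturbed (resampled) version of the data."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two subsets")
    return sum(kuncheva_index(a, b, n_features) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical subsets selected on three resampled datasets.
    runs = [{0, 1, 2}, {0, 1, 3}, {0, 1, 2}]
    print(mean_pairwise_stability(runs, n_features=10))
```

In this sketch, a method that always returns the same subset scores 1, while a method whose subsets overlap no more than random selection would scores near 0, which is the kind of comparison against random feature selection that the abstract describes.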
Similar resources
Fast SFFS-Based Algorithm for Feature Selection in Biomedical Datasets
Biomedical datasets usually include a large number of features relative to the number of samples. However, some data dimensions may be less relevant or even irrelevant to the output class. Selection of an optimal subset of features is critical, not only to reduce the processing cost but also to improve the classification results. To this end, this paper presents a hybrid method of filter and wr...
Feature selection using genetic algorithm for breast cancer diagnosis: experiment on three different datasets
Objective(s): This study addresses feature selection for breast cancer diagnosis. The process uses a wrapper approach with GA-based feature selection and a PS-classifier. The experimental results show that the proposed model is comparable to other models on the Wisconsin breast cancer datasets. Materials and Methods: To evaluate the effectiveness of the proposed feature selection method, we ...
A hybrid filter-based feature selection method via hesitant fuzzy and rough sets concepts
High dimensional microarray datasets are difficult to classify since they have many features with a small number of instances and an imbalanced distribution of classes. This paper proposes a filter-based feature selection method to improve the classification performance of microarray datasets by selecting the significant features. Combining the concepts of rough sets, weighted rough set, fuzzy rough se...
Bridging the semantic gap for software effort estimation by hierarchical feature selection techniques
Software project management is one of the significant activities in the software development process. Software Development Effort Estimation (SDEE) is a challenging task in software project management. SDEE has been practiced in the computer industry since the 1940s and has been reviewed several times. An SDEE model is appropriate if it provides accuracy and confidence simultaneously before softwa...
Evaluation of Mutual information versus Gini index for stable feature selection
The selection of highly discriminatory features has been crucial in aiding further advancements in domains such as biomedical sciences, high-energy physics and e-commerce. Therefore evaluation of the robustness of feature selection methods to small perturbations in the data, known as feature selection stability, is of great importance to people in these respective fields. However, little resear...
Journal title:
- AMIA ... Annual Symposium proceedings. AMIA Symposium
Volume 2009, Issue
Pages -
Publication date 2009